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Quantitative geneticists long ago recognized the value of studying evolu- 
tion in a multivariate framework (Pearson, 1903). Due to linkage, pleiotropy, 
coordinated selection and mutational covariance, the evolutionary response 
in any phenotypic trait can only be properly understood in the context of 
other traits (Lande, 1979; Lynch and Walsh, 1998). This is of course also 
well-appreciated by comparative biologists. However, unlike in quanti- 
tative genetics, most of the statistical and conceptual tools for analyzing 
phylogenetic comparative data (recently reviewed in Pennell and Harmon, 

2013) are designed for analyzing a single trait (but see, for example Rev- 
ell and Harmon, 2008; Revell and Harrison, 2008; Hohenlohe and Arnold, 
2008; Revell and Collar, 2009; Schmitz and Motani, 2011; Adams, 2014b). 
Indeed, even classical approaches for testing for correlated evolution be- 
tween two traits (e.g., Felsenstein, 1985; Grafen, 1989; Harvey and Pagel, 
1991) are not actually multivariate as each trait is assumed to have evolved 
under a process that is independent of the state of the other (Hansen and 
Orzack, 2005; Hansen and Bartoszek, 2012). As a result of these limitations, 
researchers with multivariate datasets are often faced with a choice: ana- 
lyze each trait as if they were independent or else decompose the dataset 
into statistically independent set of traits, such that each set can be ana- 
lyzed with the univariate methods. 

Principal components analysis (PCA) is the most common method for 
reducing the dimensionality of the dataset prior to analyzing the data us- 
ing phylogenetic comparative methods. The first PC axis is the eigenvector 
in the direction of greatest variance, the second PC axis, the second greatest 
variance, and so on. However, standard methods for calculating PC scores 
assume that the samples are independent of one another, which is hardly 
ever the case for comparative data. As a result of shared common ancestry, 
relatives are likely to share many traits and trait combinations. Performing 
comparative analyses without considering the species' evolutionary rela- 
tionships is anathema to most evolutionary biologists, but it is less well- 
appreciated that phylogeny should be considered when transforming data 
(Revell, 2009; Polly et al., 2013). 

Standard PCA continues to be regularly used in comparative biology. 
This is applied to a variety of types of traits including geometric morpho- 
metric landmarks (e.g., Dornburg et al., 2011; Hunt, 2013), measurements 
of multiple morphological traits (e.g., Harmon et al., 2010; Bergmann and 
Irschick, 2012; Weir and Mursleen, 2013; Pienaar et al., 2013; Price et al., 

2014) , and climatic variables (e.g., Kozak and Wiens, 2010; Schnitzler et al., 
2012). We stress that the papers that we have cited here are simply ex- 
amples selected from a substantial number of papers where standard PCA 
was used. 

The most frequently used approach for correcting for the non-independence 
of species is to assume a phylogenetic model for the evolution of measured 
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traits and incorporate the expected covariance in the calculation of the PC 
axes and scores (Revell, 2009). Revell's method (explained in detail below) 
assumes that the measured traits have evolved under a multivariate Brow- 
60 nian motion (BM; Edwards and Cavalli-Sforza, 1964) model of trait evo- 
lution. In a brief simulation study, Revell (2009) demonstrated that if the 
underlying model for the traits was indeed a multivariate BM model, per- 
forming standard PCA gave biased estimates of the eigenvalues, whereas 
pPCA did not. 

65 In this paper, we first extend the argument of Revell (2009) and demon- 
strate how biased eigenvalues obtained from PCA systematically distort 
biological inference in a predictable manner. Performing comparative anal- 
yses on standard PC axes can therefore positively mislead inference. This 
point has been made in other fields that deal with auto-correlated data, 

70 such as population genetics (Novembre and Stephens, 2008), ecology (Po- 
dani and Miklos, 2002), climatology (Richman, 1986) and paleobiology 
(Bookstein, 2012). However, the connection between these previous results 
and phylogenetic comparative data has not been widely appreciated and 
standard PCs continue to be regularly used in the field. We hope that our 

75 paper helps change this practice. 

Second, as stated above, Revell (2009) assumed that the measured traits 
had evolved under a multivariate BM process. As the pPC scores are not 
phylogenetically independent (Revell, 2009; Polly et al., 2013, see below), 
one must use comparative methods to analyze them which will in turn re- 

80 quire selecting an evolutionary model for the scores. The choice of model 
for the traits and the pPC scores are separate steps in the analysis (Rev- 
ell, 2009). This has the potential to introduce an odd circularity into the 
analysis: it seems probable that the choice of a model for the evolution of 
the traits will influence the apparent macroevolutionary dynamics of the 

85 resulting pPC scores. To our knowledge this effect has not been previ- 
ously explored. Here we analyze simulated data to investigate whether 
assuming a BM model for the traits introduces systematic biases in the 
pPC scores when the generating model is different. Furthermore, we an- 
alyze two real comparative datasets — wherein the traits almost certainly 

90 have not evolved via a strict BM process - to understand the implications 
of these results for the types of data that researchers actually have. 

Last, we consider the interpretation of evolutionary models fit to pPC 
axes and discuss the conceptual and statistical advantages and disadvan- 
tages of using pPCA compared to alternative approaches for studying mul- 
95 tivariate evolution in a phylogenetic comparative framework. We argue 
that the statistical advantages of using pPC axes comes at a substantial 
conceptual cost and that alternative techniques are likely to be much more 
informative for addressing many evolutionary questions. 
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Methods 

Overview ofpPCA 

Before describing our analyses, we briefly overview standard and phyloge- 
netic PCA and highlight the differences between the two. In conventional 
PCA, am x m covariance matrix R is computed from a matrix of trait values 
X for the n species and m traits 



where }i is a vector containing the means of all m traits and 1 is a column 
vector of ones. We note that in many applications X may not represent the 
raw trait values; in geometric morphometries, for example, size, translation 
and rotation will often be removed from X prior to computing R (Rohlf 
and Slice, 1990; Bookstein, 1997). The eigenvalues D and eigenvectors V of 
R are then obtained using singular-value decomposition R = VDV" 1 or 
some related technique. The scores S, the trait values of the species along 
the PC axes, are computed as 



Phylogenetic PCA differs from this procedure in two important ways 
(Revell, 2009; Polly et al., 2013) . First the covariance matrix is inversely 
weighted by the expected covariance of trait values between taxa under a 
given model E. Under a BM model of trait evolution, L is simply propor- 
tional to the matrix representation of the phylogenetic tree C, such that 
Ej ; is the shared path length between lineages i and /. (We note that as 
only relative branch lengths matter, under a multivariate BM process, we 
can simply set L = C without loss of generality.) Including the expected 
covariance between trait values essentially just re-orients the axes accord- 
ing to the phylogeny. Second, the space is centered on the "phylogenetic 
means" a of the traits rather than their arithmetic means, which can be 
computed following Revell and Harmon (2008): 




(1) 




(2) 



a = [(1TE- 1 1)- 1 1TE" 1 X]T. 



(3) 



In pPCA, Equation 1 is therefore modified as 



R = (n — 1)-\X - la T ) T L _1 (X - la T ) 



(4) 



Similarly, S can be calculated for pPCA using Equation 2 but substituting 
the phylogenetic means for the arithmetic means 
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where again, V is a matrix containing the eigenvectors of R (in this case 
obtained from Equation 4). 

130 The effect of weighting the covariance and centering the space using 
phylogeny has an important statistical consequence (Revell, 2009; Polly 
et al., 2013). In PCA, each PC score is independent of all other scores 
from the same PC axis and from scores on other axes. Due to the phylo- 
genetic structure of the data, this property of independence does not hold 

135 when using pPCA. Therefore it is necessary to analyze pPC scores using 
phylogenetic comparative methods, just as one would for any other trait 
(Revell, 2009; Polly et al., 2013). 

Effect of PCA on model selection under multivariate Brownian 
motion 

140 We simulated 100 replicate datasets under multivariate Brownian motion 
to evaluate the effect of using standard versus phylogenetic PCA to infer 
the mode of evolution. For each dataset, we used TreeSim (Stadler, 2011) to 
simulate a phylogeny of 50 terminal taxa under a pure-birth process and 
scaled each tree to unit height. We then simulated a 20-trait dataset under 

145 multivariate Brownian motion. When analyzing phylogenetic comparative 
data, R is estimated from the data; here, we set R to be a known quantity 
and use it to simulate phylogenetically structured trait data. For each sim- 
ulation, we generated a positive definite covariance matrix for R, by draw- 
ing eigenvalues from an exponential distribution with a rate A = 1 / 100 

150 and randomly oriented orthogonal eigenvectors. We then used this matrix 
to sample a covariance matrix for the tip states X ~ A/"(0, R <g> C) where 
® is the Kronecker product. For each of the 100 simulated datasets, we 
computed PC scores using both standard methods and pPCA (using the 
phytools package; Revell, 2012). We used phylolm (Ho and Ane, 2014) to 

155 fit models of trait evolution to the original data and to all PC scores ob- 
tained by both methods. Following Harmon et al. (2010), we considered 
three models of trait evolution: 1) BM; 2) Ornstein-Uhlenbeck with a fixed 
root (OU: Hansen, 1997); and 3) Early Burst (EB: Blomberg et al., 2003; Har- 
mon et al., 2010). We then calculated the Akaike weights (AICw) for each 

160 model / transformation / trait combination. 

To explore the effect of trait correlation on inference, we conducted an 
additional set of simulations where R was varied from the above simu- 
lations to result in more or less correlated sets of phenotypic traits. We 
drew eigenvalues m from an exponential distribution and scaled these 
165 so that the leading eigenvalue m 1 was equal to 1. We then exponenti- 
ated this vector across a sequence of exponents ranging for Ci to ^>i; 
this gave us a series of covariance matrices ranging from highly correlated 
(m 1 = i;m 2 , . . .,m 20 ~ o) to nearly independent (m pa 1), respectively. We 
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chose the series of exponents to obtain a regular sequence of m 1 / Y4L1 m i 
170 from 0.05 to 1. For each set of eigenvalues, we simulated 25 datasets and 
estimated the slope of the relationship between the absolute size of phylo- 
genetically independent contrasts (Felsenstein, 1985) and the height of the 
node at which they were calculated (i.e., the "node height test" of Freck- 
leton and Harvey, 2006). Under OU models, this relationship is expected 
175 to be positive, while under EB models this relationship is negative. BM 
models are expected not to show correlation between contrasts and height 
of the nodes. 

Effect of using PCA when traits are not Brownian 

We then simulated datasets under alternative models of trait evolution. Be- 
180 cause of difficulties in efficiently simulating large multivariate datasets of 
covarying traits under OU or EB models, we instead simulated 20 indepen- 
dent traits under BM, OU and EB for 50 taxa trees (as above, but setting all 
eigenvalues equal to one another). Of course, this is certainly not represen- 
ative of the process that have shaped real multivariate data, but considering 
185 this simple case allowed us to investigate how misspecifying the model of 
trait evolution can impact analyses under the simplest scenario. 

For the BM simulations, we set a 2 = 1. For OU, we set a 2 = 1 and 
oc = 2, such that the phylogenetic half-life log(2)/o: (Hansen et al., 2008) 
was equal to ~ 0.35 of the total tree depth. For EB, we again set a 2 = 1 and 
190 set a, the exponential rate of deceleration, to be log(o.02). 

As above, we fit BM, OU and EB models to the original data, PC scores 
and pPC scores for each simulated dataset and estimated parameters and 
AICw. In addition to the model-fitting and comparison, for every trans- 
formation, we applied two common diagnostic tests for deviation from 
195 BM-like evolution to all trait/PC axes. First, we calculated the slope of the 
node height test as described in the preceding section. Second, we charac- 
terized the disparity through time (Harmon et al., 2003) using the geiger 
package (Pennell et al., in press). 

Finally, we examined the scenario in which a set of traits each follow a 
200 model of evolution with unique evolutionary parameters. In particular, we 
use the accelerating-decelerating (ACDC) model of Blomberg et al. (2003) to 
generate independent trait datasets. This model is a general case of the EB 
model which allows both accelerating or decelerating rates of phenotypic 
evolution. Accelerating rates of evolution result in identical likelihoods 
205 as the OU model (assuming the root state is at the optimal trait value), 
and thus are equivalent for our purposes. We simulated 100 datasets with 
50 taxa and 20 traits. Trees were generated as in previous simulations. 
Each trait was simulated along the phylogeny with an ACDC parameter 
(a) drawn from a normal distribution with mean o and standard deviation 
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210 of 5. Values of a above o correspond to accelerating evolutionary rates, 
while those below o correspond to decelerating, or Early-Burst models of 
evolution. For each dataset, we conducted both standard and phylogenetic 
PCA in which the traits are standardized to unit variance (i.e., using cor- 
relation matrices, which ensured traits across parameter values had equal 

215 expected variances). For each PC or pPC, we regressed the magnitude of 
the trait loadings against the trait's ACDC parameter value. We then vi- 
sualized whether there were systematic trends in the relationship between 
the ACDC parameter value, and the weight given to a particular trait across 
PC axes. Such systematic trends would indicate that either PCA or pPCA 

220 "sorts" traits into PC axes according to the particular evolutionary model 
each trait follows. 

Empirical examples 

We analyzed two comparative datasets assembled from the literature, al- 
lowing us to investigate the effects of principal components analyses on re- 

225 alistically structured data. First, we analyzed phenotypic evolution across 
the family Felidae (cats) using measurements from two independent sources 
— five cranial measurements from Slater and Van Valkenburgh (2009) and 
body mass and skull width from Sakamoto et al. (2010). For the analy- 
sis, we used the supertree compiled by Nyakatura and Bininda-Emonds 

230 (2012). Second, we analyzed 23 morphometric traits in Anolis lizards and 
phylogeny from Mahler et al. (2010). In both datasets, all measurements 
were linear measurements on the logarithmic scale. We conducted stan- 
dard and phylogenetic PCA and examined the effect of each on model- 
fitting, the slope of the node height test, and the average disparity through 

235 time. All simulations and analyses were conducting using R V3.0.2 (R De- 
velopment Core Team, 2013). Scripts to reproduce all analyses are available 
at https : //github . com/mwpennell/phyloPCA. 

Results 

Effect of PCA on model selection under multivariate Brownian 
motion 

Standard PCA introduces a systematic bias in the favored model across 
principal components. In our simulations where the traits evolved under 
a multivariate BM model, EB models had systematically elevated support 
as measured by Akaike weights for the first few components, for which 
245 it generally exceeded support for the BM model (Figure 1). Fitting mod- 
els sequentially across PC axes 1-20 revealed a regular pattern of increas- 
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ing support for BM models moving from the first toward the intermediate 
components, followed by increasing support for OU models among later 
components (which generally approached an AICw of 1). This regular pat- 
tern across trait axes was not present for either the original datasets, or for 
phylogenetic principal components, which found strong support for the 
BM model regardless of which trait was analyzed. We note that the theo- 
retical maximum AICw for the BM model in the three-model comparison 
is 1/ (2e _1 + 1) « 0.576, as BM is a special case of both OU and EB and 
therefore the A AICw for these models cannot be greater than 2. 

Multivariate datasets simulated with high correlations (low effective di- 
mensionality) showed increased support for BM across PC axes. When 
the leading eigenvalue explained a large proportion of the variance, the 
slope of the node height test converged toward o, indicating no system- 
atic distortion of the contrasts through time (Figure 2). However, when the 
eigenvalues of the rate matrix were more even, standard PCA resulted in 
a negative slope in the node height test among the first few PCs, which 
in turn provides elevated support for EB models. This pattern is reversed 
among higher PC axes, which have positive slopes between node height 
and absolute contrast size, which provides elevated support for OU mod- 
els (Figures 2 and 4). 

Effect of using PCA when traits are not Brownian 

If the underlying model was either OU or EB rather than BM, then phy- 
logenetic PCA tended to increase support for the true model relative to 
the original trait variables for the first few component axes (Figures 3, Si, 
and S2). For example, when each of the original trait variables were simu- 
lated under an OU process, support for the OU model increased for pPCi 
relative to the original trait variables. Higher principal component axes in- 
ferred a regular pattern of decreasing support for the OU model, while the 
last few PCs have equivocal support for either a BM or OU model (Figure 
3). Furthermore, parameter estimation was affected by phylogenetic PCA. 
The oc parameter of the OU model was estimated to be stronger than the 
value simulated for individual traits for the first few pPC scores and lower 
for the higher components (Figure S3). 

Examining the outcomes of the node height tests (Figure 4) and the dis- 
parity through time analyses (Figure 5) help clarify the results we observed 
from model comparison and parameter estimation. Under OU models, 
traits are expected to have the highest contrasts near the tips, whereas un- 
der EB models, traits will have the highest contrasts near the root of the 
tree. Under multivariate BM, standard PCA maximizes the overall vari- 
ance explained across the entire dataset, thereby tending to select linear 
combinations of traits that maximize the contrasts at the root of the tree. 
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Thus, the first few PCs are skewed toward resembling EB models, while the 
last few PCs are skewed toward resembling OU models. By contrast, the 
effect of pPCA on the node height relationship depends on the generating 
model. When traits are evolved under an OU model, the first few pPC axes 
show an exaggerated pattern of high variance towards the tips. Likewise, 
when traits are evolved under an EB model, the first few pPC axes show an 
exaggerated pattern of high variance towards the root of the tree. For traits 
generated under both OU and EB models, the higher components resemble 
BM-like patterns. 

When the data includes traits evolved under different parameters, both 
PCA and pPCA resulted in the systematic assignment of traits with par- 
ticular parameter values to each PCA axes. Traits which follow EB models 
are preferentially given higher loadings under both PCA and pPCA for the 
first few PCs as well as the last few PCs (Figure 6). Furthermore, both PCA 
and pPCA assign fewer PCs with higher loadings to traits that follow EB 
models, whereas the intermediate PCs have more even loadings slightly 
skewed toward accelerating rates (i.e., OU-like models). This asymmetry 
may reflect the fact that EB models are more variable in their outcomes 
to the phylogeny, owing to the fewer independent branches among which 
divergence can occur closer to the root of the tree. Our results indicate that 
both pPCA and PCA can be biased in the selection of PC axes with respect 
to the generating evolutionary model. 

Empirical examples 

In the Felid dataset, the seven morphometric traits were extremely highly 
correlated, with the first PC explaining 96.9% and 93.7% of the total vari- 
ation in the dataset for standard PCA and phylogenetic PCA, respectively. 
All raw traits and the first PC axis of both standard and phylogenetic PCA 
support a BM model of evolution (PC and pPC axes have AICw's of 0.574, 
which is near the theoretical maximum for BM). The last four standard 
PC axes show strong support for a OU model (AICw ~ 1) whereas un- 
der phylogenetic PCA the last axes have mixed support favouring BM or 
OU (Figure S4). Both the node height test and the disparity through time 
plots show this same pattern. The node height slope of the first axis is ap- 
proximately zero while the slope of the remaining axes are slightly positive 
under standard and phylogenetic PCA. The first axis show the same dispar- 
ity through time pattern of the untransformed data in both standard and 
phylogenetic PCA. However, the last PC axes show disparity accumulated 
toward the tips under standard PC, while phylogenetic PCA produced a 
less clear pattern (Figure S5). 

For the morphometric traits in the Anolis dataset, the first PC also ex- 
plained a large proportion of the variation (92.6% and 90.0% for standard 
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and phylogenetic PC A, respectively). Most of the untransformed traits had 
equivocal support for either a BM or EB model (Figure 7). While PCi of 
both PCA and pPCA mirrored this pattern, the remaining PCs for both 
PCA and pPCA show a general pattern of decreasing support for an EB 
model (Figure 7). Collectively PCs 2-4 had higher support for the EB model 
than any other PC in both standard PCA (AICw£g: PC2 = 1.0; PC3 = 0.47; 
PC4 = 0.28) and phylogenetic PCA (AlCw^g: pPC2 = 1.0; PPC3 = 0.43, 
PPC4 = 0.27). Similarly, these early PC axes tended to have more negative 
slopes from the node height test relative to the average trait in the dataset 
(Figure S6). 



Discussion 

340 Different ways of representing the same set of data can change the mean- 
ing of measurements and alter the interpretations of subsequent statistical 
analyses (Houle et al., 2011). PCA is often considered to be a simple linear 
transformation of a multivariate dataset and the potential consequences of 
performing phylogenetic comparative analyses on PC scores has received 

345 very little attention. In this paper, we sought to highlight the fact that fitting 
macroevolutionary models to a handful of PC axes may positively mislead 
inference — what appears like the signal of an interesting biological pro- 
cess may simply be an artifact stemming from how PCA is computed. By 
focusing analyses exclusively on the first few PC axes, as is commonly done 

350 in comparative studies, researchers are, in effect, taking a biased sample of 
a multivariate distribution. We demonstrate how these biased samples can 
influence inference in both PCA and pPCA — particularly toward inferring 
decreasing rates of evolution in highly dimensional datasets. 

We can obtain an intuitive understanding of how inference can be af- 
355 fected by PCA by considering data simulated under a multivariate BM 
model. Despite a constant rate of evolution across each dimension of trait 
space, stochasticity will ensure that some dimensions will diverge more 
rapidly than expected early in the phylogeny, while others will diverge 
less. All else being equal, dimensions that happen to diverge early are 
360 expected to have the greatest variance across species, and standard PCA 
will identify these axes as the primary axes of variation. However, the trait 
combinations that are most divergent early, will have apparent slowdowns 
towards the present simply due to "regression toward the mean", result- 
ing in the characteristic "early burst" pattern of evolution for the first few 
365 principal components (for a related point in the context of models of lin- 
eage diversification, see Pennell et al., 2012). An analogous process will 
result in the last few PCs following an OU process, in which the amount 
of divergence will appear to increase toward the present. Standard PCA 
thus "sorts" orthogonal trait dimensions by whether they follow EB, BM 
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and finally, OU like patterns of trait divergence. Thus, traits studied using 
PCA may often be biased to reflect particular evolutionary models, merely 
as a statistical artifact. 

These problems, which ultimately stem from using PCA on auto-correlated 
data, are not limited to phylogenetic comparative studies (see Richman, 
1986; Podani and Miklos, 2002; Novembre and Stephens, 2008; Jolliffe, 2002; 
Bookstein, 2012). For example, Novembre and Stephens (2008) demon- 
strated that apparent waves of human migration in Europe obtained from 
PCA of genetic data (e.g., Cavalli-Sforza et al., 1994) could be attributed 
to artifacts similar to those we document here (in their case, the auto- 
correlation was the result of geography rather than phylogeny). While the 
bias introduced by applying standard PC to comparative data has been 
documented previously (Revell, 2009; Polly et al., 2013), we sought to clar- 
ify precisely how inferences of macroevolutionary processes and patterns 
can be impacted by this bias. 

Revell (2009) recognized this biased selection of eigenvectors and intro- 
duced pPCA as a means for accounting for the phylogenetic correlation 
of data points. Our simulations verify that when the underlying model 
is multivariate BM, phylogenetic PCA mitigates the biased selection of PC 
axes by scaling divergence by the expected divergence given the branch 
lengths of the phylogeny. However, BM is often a poor descriptor of the 
macroevolutionary dynamics of trait evolution (for example, see Harmon 
et al., 2010; Pennell et al., 2014) and assuming this model when performing 
pPCA is less than ideal. Revell (2009) suggested that alternative covariance 
structures could be used to estimate phylogenetically independent PCs for 
different models. For example, one could first optimize the A model (Pagel, 
1999) across all traits simultaneously and then rescale the branch lengths of 
the tree according to the estimated parameter in order to obtain E for use in 
Equation 4. However, one cannot compare model fits across alternative lin- 
ear combinations of traits, so the data transformation must occur separately 
from modeling the evolution of the PC axes. As Revell noted, parameters 
estimated to construct the covariance structure for the pPCA will likely be 
different from the same parameters estimated using the PC scores them- 
selves. Furthermore, this procedure is restricted to models that assume a 
shared mean and variance structure across traits (see Hansen et al., 2008; 
Bartoszek et al., 2012, for examples where this does not apply). As such, if 
the question of interest relies on model-based inference, transformation of 
trait data using pPCA will require simplifying and rather ad-hoc assump- 
tions, and researchers must hope that the resulting inferences are generally 
robust to these decisions. 

We show that when the trait model is misspecified, systematic and pre- 
dictable distortions occur across pPC axes — similar to those that were 
observed when the phylogeny was ignored altogether. In some scenar- 
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ios, such distortions may not substantially alter inference. For example, 
when all traits evolve under a OU model (or when all traits evolve un- 
der a EB model), we find that these distortions primarily serve to inflate 
the support for the true model. Even so, interpretation of parameter esti- 
mates for pPC scores becomes much more challenging (Figures 4, 5, and 
S3). However, more complex scenarios can produce more worrying distor- 
tions. When evolutionary rates vary through time and across traits, both 
PCA and pPCA will sort traits into PC axes according to which evolution- 
ary model they follow, despite all traits being evolutionarily independent. 
Under the conditions we examined, this resulted in both PCi and pPCi be- 
ing heavily-weighted toward EB-type models, despite simulating an even 
distribution of accelerating and decelerating rates across traits. Sugges- 
tively, we observe similar patterns for both PCA and pPCA in the Anolis 
morphometric dataset (Figures 7 and S6). Focusing on the first few axes 
of variation identified by pPCA alone may skew our view of evolutionary 
processes in nature, and bias researchers toward finding particular patterns 
of evolution. 

When employed as a descriptive tool, PCA can be broadly used even 
when assumptions regarding statistical non-independence or multivariate 
normality are violated (Jolliffe, 2002). There is nothing wrong with using 
standard PCA or pPCA on comparative data to describe axes of maxi- 
mal variation across species or for visualizing divergence across phylo- 
morphospace (Sidlauskas, 2008). Furthermore, our simulations and em- 
pirical analyses suggest that strong correlations among traits (i.e., when 
the leading eigenvector explained a majority of the variation, e.g., > 75%), 
PC scores may not be appreciably distorted (Figure 2). The statistical arti- 
facts we describe are more likely to appear when matrices have high effec- 
tive dimensionality (see Bookstein, 2012). Given that many morphometric 
datasets may be highly correlated, the overall effect of using PCA or of 
misspecifying the model in phylogenetic PCA may in some cases be rela- 
tively benign. And we certainly do not mean to imply that the biological 
inferences that have been made from analyzing standard or phylogenetic 
PC scores in a comparative framework are necessarily incorrect. When 
Harmon et al. (2010) analyzed the evolution of PC2 (what they referred 
to as "shape") obtained using standard PCA, they found very little sup- 
port for the EB model across their 39 datasets. The magnitude of the bias 
introduced by using standard PCA is difficult to assess but any bias that 
did exist would be towards finding EB-like patterns. This only serves to 
strengthen their overall conclusion that such slowdowns are indeed rare 
(but see Slater and Pennell, 2014). However, our results do suggest that in 
some cases analyses conducted with PC axes should be revisited to ensure 
that results are robust. 

The broader question raised by our study is how one should draw evo- 
lutionary inferences from multivariate data. The first principal component 
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axis from pPCA is the major axis of divergence across the sampled lineages 
in the clade (also known as the "line of divergence"). This axis is of con- 
siderable interest in evolutionary biology. The direction of this line of di- 

4 6o vergence may be affected by the orientation of within-population additive 
genetic (co)variance G, such that evolutionary trajectories may be biased 
along the "genetic line of least resistance" (i.e., divergence occurs primar- 
ily along the leading eigenvector of G, g max ; Schluter, 1996). Alternatively 
the line of divergence may align with co max , the "selective line of least re- 

465 sistance", due to the structure of phenotypic adaptive landscapes (Arnold 
et al., 2001; Jones et al., 2007; Arnold et al., 2008), or else may be driven by 
patterns of gene flow between populations (Guillaume and Whitlock, 2007) 
or the pleiotropic effects of new mutations (Jones et al., 2007; Hether and 
Hohenlohe, 2014). 

470 While it is perfectly sensible and interesting to compare the orienta- 
tion of pPCi to that of gmax/ Wmax or other within-population parameters, 
making explicit connections between macro- and microevolution requires a 
truly multivariate approach. Quantitative genetic theory (e.g., Lande, 1979; 
Lynch and Walsh, 1998) makes predictions about the overall pattern of evo- 

475 lution in multivariate space. By fitting evolutionary models to pPC scores, 
we are only considering evolution along these axes independently and are 
therefore missing most of what is going on. In contrast, multivariate tests 
for the correspondence of axes of trait variation within and between species 
can provide meaningful insights into the long-term determinants of evo- 

480 lutionary change, and the process by which traits evolve (Hohenlohe and 
Arnold, 2008; Bolstad et al., 2014). 

The most conceptually straightforward multivariate approach for an- 
alyzing comparative data is to construct models in which there is a co- 
variance in trait values between species (which is done in univariate mod- 

485 els) and a covariance between different traits. Such multivariate exten- 
sions of common quantitative trait models have been developed (Butler and 
King, 2004; Revell and Harmon, 2008; Hohenlohe and Arnold, 2008; Revell 
and Collar, 2009; Thomas and Freckleton, 2012). These allow researchers 
to investigate the connections between lines of divergence and within- 

490 population evolutionary parameters (Hohenlohe and Arnold, 2008) as well 
as to study how the correlation structure between traits itself changes 
across the phylogeny (Revell and Collar, 2009). 

However, these approaches also have substantial drawbacks. First, the 
number of free parameters of the models rapidly increases as more traits 
495 are added (Revell and Harmon, 2008), making them impractical for large 
multivariate datasets. This issue may be addressed by constraining the 
model in meaningful ways (Butler and King, 2004) or by assuming that 
all traits (or a set of traits) share the same covariance structure (Klingen- 
berg and Marugan-Lobon, 2013; Adams, 20i4b,a). Such restrictions of pa- 
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5 oo rameter space are especially appropriate for truly high-dimensional traits, 
such as shape inferred from geometric morphometric landmark data. For 
such traits, we are primarily interested in the evolution of the aggregate 
trait and not necessarily the individual components. The second drawback 
is that these models allow for inference of the covariance between traits 

505 but the cause of this covariance is usually not tied to specific evolution- 
ary processes. This difficulty can be addressed by explicitly modeling the 
evolution of some traits as a response to evolution of others. Hansen and 
colleagues have developed a number of models in which a predictor vari- 
able evolves via some process and a response variable tracks the evolution 

510 of the first as OU process (Hansen et al., 2008; Hansen and Bartoszek, 2012; 
Bartoszek et al., 2012). This has been a particularly useful way of modeling 
the evolution of allometries (e.g. Hansen and Bartoszek, 2012; Voje et al., 
2013; Bolstad et al., 2014). But, as with the "full covariance" models dis- 
cussed above, increasing the number of traits makes the model much more 

515 complex and parameter estimation difficult. 

As we can only estimate a limited number of parameters from most 
comparative datasets — and even when we consider large datasets, most 
existing comparative methods have only been developed for the univari- 
ate case — it often remains necessary to reduce the dimensionality of a 
520 multivariate dataset to one or a few compound traits. We believe that al- 
though PCA can be potentially quite usefully applied to this problem, it 
may be in ways that are statistically and conceptually distinct from how it 
is conventionally applied to comparative data. 

First, we argue that reducing multivariate problems to more easily man- 
525 aged, lower-dimensionality analyses should be approached with the spe- 
cific goal of maintaining biological meaning and interpretability (Houle 
et al., 2011). The common practice of examining only the first few PCs 
carries with it the implicit assumption that PCA ranks traits by their evo- 
lutionary importance and biological interest — a conclusion that may not 
53 o necessarily be the case (Polly et al., 2013). If a certain PC axis is of suffi- 
cient biological interest in its own right, it may not matter if it is a biased 
subset of a multivariate distribution. The fact that a vast majority of the 
traits studied in adaptive radiations likely represent very biased axes of 
variation across the multivariate process of evolution does not diminish 
535 the importance of the inferences made from studying these traits (Schluter, 
2000). 

The danger occurs when the biological significance of the set of traits 
is poorly understood, and when the source of the statistical signal may be 
either artifactual or biological. If a trait was not of interest a priori, then 
540 this essentially turns into a multiple comparisons problem in which PCA 
searches multivariate trait space for an unusual axis of variation, which 
tend to suggest models inconsistent with the generating multivariate pro- 
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cess as a whole. A posteriori interpretation of the PC axes by their loadings 
is something of an art — one must "read the tea leaves" to understand what 

545 these axes mean biologically. Even when a particular axis is correlated with 
a biological interpretation, it can be unclear whether the statistical signal 
supporting a particular inference results from the evolutionary dynamics 
of the trait of interest or if it is the result of statistical artifacts introduced 
by the imperfect representation of that trait by a PC axis. More rigorous 

55 o algorithms can be applied to identify subsets of the original variables that 
best approximate the principal components, which — although still biased 
— are frequently more interpretable (Cadima and Jolliffe, 2001; Somers, 
1986, 1989; Hausman, 1982; Vines, 2000; Jolliffe, 2002; Zou et al., 2006). An- 
other potential approach is to use principal components computed from 

555 within-population data, rather than comparative data. For example, if G 
(or failing that, the phenotypic variance-covariance matrix P) is available 
for a focal species, then the traits associated to the principal axes of vari- 
ation in that species can be measured across all species in the phylogeny. 
In other words, across species trait measurements can be projected along 

560 gmax- This alleviates the issues we discuss in this paper by estimating PCs 
from within-population data that is independent from the comparative 
data used for model-based inference. 

Of course, components defined by within-population variance struc- 
ture or by approximating principal components with interpretable linear 

565 combinations will not explain as much variance across taxa as standard 
PCA and will not necessarily be statistically independent of one another. 
However, the extra variance explained by the principal components of com- 
parative data may in fact include a sizeable amount of stochastic noise, 
rather than interesting biological trait variation (as we have shown in our 

570 simulations). Furthermore, while the traits identified by pPCA will be sta- 
tistically orthogonal, this is only true in the particular snapshot captured 
by comparative data and does not imply that they are evolving indepen- 
dently. The distinction between statistical and evolutionary independence 
is crucial (Hansen and Houle, 2008) but it is easy to conflate these concepts 

575 when the data has been abstracted from its original form. We argue that the 
added intepretability of carefully chosen and biologically meaningful trait 
combinations far outweighs the cost of some trait correlations or explaining 
less-than-maximal variation. 

Concluding remarks 

580 In this note, we sought to clarify some statistical and conceptual issues re- 
garding the use of principal components in comparative biology. We have 
shown that from a statistical standpoint, failing to consider the phylogeny 
when performing PCA can be positively misleading. And despite the de- 
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velopment of methods to correct for this, in our reading of the empirical 
585 literature, we have found this to be a common oversight. We have also 
demonstrated that misspecifying the model of trait evolution when con- 
ducting pPCA may influence the inferences we make from the pPC scores. 
We show that in some scenarios, pPCA may sort traits according to the 
particular evolutionary models they follow in a similar manner as stan- 
590 dard PCA — ignoring phylogeny altogether is, of course, a form of model 
misspecification. Consequently, we caution that the use of pPCA may bias 
inference toward identifying particular evolutionary patterns, which may 
or may not be representative of the true multivariate process shaping trait 
diversification as a whole. 

595 We hope that our paper provokes discussion about how we should go 
about analyzing multivariate comparative data. We certainly do not have 
the answers but argue there are some major theoretical limitations inherent 
in using PCA (phylogenetic or not) to study macroevolutionary patterns 
and processes. 
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PC axis 



Figure 1: Distribution of support for BM, OU and EB models when the 
generating model is a correlated multivariate BM model. Support for mod- 
els were transformed onto a linear scale by calculating an overall model 
support statistic: AICwou ~ AICweb- Thus high values support OU, low 
values support EB, and intermediate values near o indicate BM-like evo- 
lution. Models were fit to each replicated dataset for each of 20 different 
traits which were taken either from PC scores (blue line) or phylogenetic 
PC scores (green line). Shaded regions indicate the 25 th and 75*" quan- 
tiles of the model support statistic for 100 replicated datasets. The red line 
indicates the average model support statistic averaged over all 20 original 
trait variables. Note that EB models have higher Akaike weights for the 
first few PCs of standard PCA, and that later PCs subsequently favor BM 
and finally, OU models. No such bias is found across traits for either the 
original data or pPCA. 
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Figure 2: Effect of trait correlations on the slope of the node height test 
for PC scores (left) and pPC scores (right) under a multivariate BM model 
of evolution. The red line is the aggregated data for all 20 traits on the 
original (untransformed) scale. The intensity of the colors are proportional 
to the ranking of the PC or pPC axes, stronger lines represent the first axes. 
When the leading eigenvector explains very little variation in the data and 
the effective dimensionality is high, the slope of node height test increases 
from negative to positive across PC axes. This indicates that under stan- 
dard PCA, PCi has higher contrasts near the root of the tree, while later 
PCs have higher contrasts near the tips (resulting in the pattern of model 
support observed in Figure 1). As the amount of variance explained by 
the principal eigenvector increases, the slope of the node height test ap- 
proaches o. No such effect is found for phylogenetic PCA. 
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Figure 3: Distribution of support for BM, OU and EB models when the 
generating model is a uncorrelated multivariate OU model. Support for 
models were transformed into a linear scale by calculating an overall model 
support statistic: AICwou ~ AICweb- Thus high values support OU, low 
values support EB, and intermediate values near o indicate BM-like evo- 
lution. Models were fit to each replicated dataset for each of 20 different 
traits which were taken either from PC scores (blue line) or phylogenetic 
PC scores (green line). Shaded regions indicate the 25^ and 75 quan- 
tiles of the model support statistic for 100 replicated datasets. The red line 
indicates the average model support statistic averaged over all 20 original 
trait variables. Note that standard PCA results in Akaike weights that are 
skewed toward EB for the first few PCs of standard PCA, while and that 
later PCs subsequently favor OU models. By contrast, pPCA results in 
Akaike weights that are skewed toward stronger support for OU models 
relative to the original trait variables. 
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Figure 4: Relationship between the average phylogenetic independent con- 
trasts and the height of the node across 100 datasets simulated under either 
a BM (left), OU (middle) or EB (right) model of evolution. Contrasts were 
calculated for each of the 20 traits corresponding to either PC scores (top 
row) or pPC scores (bottom row). Each line represents a best-fit linear 
model to the aggregated data across all 100 replicate simulations. Red 
lines are aggregated over all 20 traits on the original data. The plots are 
oriented so that the left side of each panel corresponds to the root of the 
phylogeny, with time increasing tipward to the right. The intensity of the 
colors are proportional to the ranking of the PC or pPC axes, stronger lines 
represent the first axes. PCA results in a predictable pattern of increasing 
slope in the contrasts across PCs. By contrast, pPCA only has systematic 
distortions across pPC axes when the underlying model is not multivariate 
BM. When this occurs, the first few pPC axes tend to have more extreme 
slopes than the original data (but in the correct direction). 
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Figure 5: Disparity through time plots averaged across the 100 simulated 
datasets. The datasets were simulated under BM (left), OU (middle) or EB 
(right). The analyses were then performed on PC scores (top row) and pPC 
scores (bottom row). The average disparity through time of all 20 original 
trait variables is indicated by the red line. We fit a loess curve through the 
relative disparities for each trait/ transformation/ model combination. The 
plots are oriented so that the left side of each panel corresponds to the root 
of the phylogeny, with time increasing tipward to the right. The intensity 
of the colors are proportional to the ranking of the PC or pPC axes, stronger 
lines represent the first axes. As in Fig. 4, the first few axes from the PCA 
show a strong pattern of high disparity early in the clades' histories with 
the higher components showing seemingly higher disparity towards the 
present. Phylogenetic PCA corrects the distortion if the generating model 
is multivariate BM. However, if the generating model was not BM, the first 
few pPC axes tend to show an exaggerated pattern of disparity relative to 
the original traits. 
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Figure 6: Relationship between factor loadings and ACDC parameter (a) 
for PCA (left) and pPCA (right) across 100 simulated datasets. For each 
simulation a value of ct were drawn from a Normal distribution with mean 
= o and sd = 5. Boxplots indicate the distribution of the slope of a lin- 
ear model describing the relationship between the absolute factor loadings 
for a given PC and the magnitude of the ACDC parameter. A negative 
slope indicates that traits with decelerating rates of evolution tend to have 
higher loadings in that particular PC. Conversely, positive slopes indicate 
that traits with accelerating rates tend to have higher loadings. 
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Figure 7: Distribution of support for BM, OU and EB models for a 23- 
trait morphometric dataset taken from Mahler et al. (2010). Support is 
measured in Akaike weights across all original trait variables (left), as well 
as standard PCA (middle) and pPCA (right). For both PCA and pPCA, 
support for the EB model appears to be concentrated in PCs 1-4, with a 
suggestive pattern of decreasing support across PCs 2-4. 
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Figure Si: Distribution of support for BM, OU and EB models when the 
generating model is an uncorrelated multivariate BM model. Support for 
models were transformed into a linear scale by calculating an overall model 
support statistic: AICu>ou ~ AICweb- Thus high values support OU, low 
values support EB, and intermediate values near o indicate BM-like evo- 
lution. Models were fit to each replicated dataset for each of 20 different 
traits which were taken either from PC scores (blue line) or phylogenetic 
PC scores (green line). Shaded regions indicate the 25^ and 75 quan- 
tiles of the model support statistic for 100 replicated datasets. The red line 
indicates the average model support statistic averaged over all 20 original 
trait variables. Note that EB models have higher Akaike weights for the 
first few PCs of standard PCA, and that later PCs subsequently favor BM 
and finally, OU models. No such bias is found across traits for either the 
original data or pPCA axes. 



30 



Downloaded from http://biorxiv.org/ on September 18, 2014 



PCA 

Phylogenetic PCA 



~l I I I I I I I I I I I I I 
7 8 9 10 11 12 13 14 15 16 17 18 19 20 



PC axis 



Figure S2: Distribution of support for BM, OU and EB models when the 
generating model is an uncorrelated multivariate EB model. Support for 
models were transformed into a linear scale by calculating an overall model 
support statistic: AICu>ou ~ AICweb- Thus high values support OU, low 
values support EB, and intermediate values near o indicate BM-like evo- 
lution. Models were fit to each replicated dataset for each of 20 different 
traits which were taken either from PC scores (blue line) or phylogenetic 
PC scores (green line). Shaded regions indicate the 25^ and 75*" quantiles 
of the model-support statistic for 100 replicated datasets. The red line indi- 
cates the average model support statistic averaged over all 20 original trait 
variables. Note that both pPCA and PCA increase support for EB models 
for early PC axes. 
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Figure S3: Estimated values of the a parameter from phylogenetic PC A 
when data is simulated under an uncorrelated multivariate OU model. The 
simulating value oc = 2 is depicted with the red line. The estimate of a is 
inflated in the first few pPC axes consistent with an exaggerated support 
for the OU model. In the last pPC axes, a is estimated to be very close to o, 
such that the OU model is statistically indistinguishable from a BM model. 
These results mirror those depicted in Figure 3. 
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Figure S4: Proportion of support for BM, OU and EB models for each of 
the traits/PC axes from the morphological dataset of Felidae species. Traits 
were log transformed prior to analysis. Note that all original traits and the 
first axes under standard and phylogenetic PCA show strong support for a 
BM model. 



33 



Downloaded from http://biorxiv.org/ on September 18, 2014 



Node height test 


Disparity through time 




~- 




PCA 












-o 

3" 






logenetic PCA 













Time 



Figure S5: Node height test and disparity through time plots for the mor- 
phological dataset of Felidae species. Each line represents a best-fit linear 
model (left) or loess curve fitted (right) to the original traits, PC or pPC 
scores. All traits were log transformed prior to analysis. The intensity of 
color is proportional to the ranking of the PC or pPC axes, stronger lines 
represent the first axes. Left panels show the relationship between the av- 
erage phylogenetic independent contrasts and the height of the node. Red 
lines indicate the average value for the original trait values. Right panels 
show disparity through time plots. The plots are oriented so that the left 
side of each panel corresponds to the root of the phylogeny, with time in- 
creasing tipward to the right. Compare this highly correlated dataset with 
only 7 traits to the larger, less correlated dataset of Anolis lizards (Figure 
S6). 
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Figure S6: Node height test and disparity through time plots for the mor- 
phological dataset of Anolis lizards. Each line represents a best-fit linear 
model (left) or loess curve fitted (right) to the original traits, PC or pPC 
scores. All traits were log transformed prior to analysis. The intensity of 
color is proportional to the ranking of the PC or pPC axes, stronger lines 
represent the first axes. Left panels show the relationship between the av- 
erage phylogenetic independent contrasts and the height of the node. Red 
lines indicate the average value for the original trait values. Right pan- 
els show disparity through time plots. The plots are oriented so that the 
left side of each panel corresponds to the root of the phylogeny, with time 
increasing tipward to the right. 
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